Abstract —Mrs. Gerber’s Lemma lower bounds the entropy at 
the output of a binary symmetric channel in terms of the entropy 
of the input process. In this paper, we lower bound the output 
entropy via a different measure of input uncertainty, pertaining 
to the minimum mean squared error (MMSE) prediction cost 
of the input process. We show that in many cases our bound is 
tighter than the one obtained from Mrs. Gerber’s Lemma. As 
an application, we evaluate the bound for binary hidden Markov 
processes, and obtain new estimates for the entropy rate. 


I. Introduction 

Mrs. Gerber’s Lemma [1] lower bounds the entropy of the
output of a binary symmetric channel (BSC) in terms of the
entropy of the input to the channel. More specifically, if $X \in \{0,1\}^n$
is an n-dimensional binary random vector with entropy
$H(X)$, $Z \in \{0,1\}^n$ is an n-dimensional binary random vector
with i.i.d. Bernoulli($\alpha$) components, statistically independent
of $X$, and $Y = X \oplus Z$, Mrs. Gerber’s Lemma states that

$$\frac{1}{n} H(Y) \ge h\left(\alpha * h^{-1}\left(\frac{H(X)}{n}\right)\right), \qquad (1)$$

where $h(p) = -p\log(p) - (1-p)\log(1-p)$ is the binary
entropy function, $h^{-1}$ is its inverse function restricted to
$[0, \frac{1}{2}]$ and $a * b = a(1-b) + b(1-a)$ denotes the binary
convolution between two numbers $a, b \in [0,1]$. For $X$ i.i.d.,
the inequality (1) is tight.
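The quantities in (1) are easy to evaluate numerically. The following is a minimal sketch, not part of the original paper: the function names `h`, `h_inv`, `conv` and `mgl_bound` are illustrative, entropies are measured in bits, and the inverse entropy is computed by bisection as an assumption of convenience.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def h_inv(v, tol=1e-12):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def mgl_bound(alpha, entropy_per_symbol):
    """Right-hand side of (1) for a given value of H(X)/n."""
    return h(conv(alpha, h_inv(entropy_per_symbol)))

# For an i.i.d. Bernoulli(p) input, H(X)/n = h(p) and H(Y)/n = h(alpha * p),
# so the two printed numbers should coincide (the tight case of (1)).
alpha, p = 0.11, 0.3
print(mgl_bound(alpha, h(p)), h(conv(alpha, p)))
```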

The inequality (1) is in fact a simple consequence of the
conditional scalar Mrs. Gerber’s Lemma, which states the
following: If $U$ is some random variable, $X|U = u \sim$
Bernoulli($P_u$), and $Z \sim$ Bernoulli($\alpha$) is statistically
independent of $(X, U)$, we have that

$$H(X \oplus Z|U) \ge h\left(\alpha * h^{-1}(H(X|U))\right), \qquad (2)$$

or alternatively,

$$E\,h(\alpha * P_U) \ge h\left(\alpha * h^{-1}(E\,h(P_U))\right). \qquad (3)$$

Since the publication of [1], many extensions, generalizations
and results of a similar flavor have been found, see
e-g-^ El-Q. In this paper, we derive a lower bound on entropy 
of the output Y in terms of the minimum mean squared error 
predictability of the input X, as we define next. 

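Let $\pi$ be some permutation of the coordinates $\{1, 2, \ldots, n\}$.
We define the minimum mean squared error (MMSE) predictability
of a binary vector $X$ w.r.t. the permutation $\pi$ as

$$\begin{aligned}
\mathrm{MMSE}_\pi(X) &\triangleq \sum_{i=1}^{n} \mathrm{MMSE}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, X_{\pi(i-2)}, \ldots, X_{\pi(1)}\right) \\
&= \sum_{i=1}^{n} E\,\mathrm{Var}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, X_{\pi(i-2)}, \ldots, X_{\pi(1)}\right) \\
&= \sum_{i=1}^{n} E\left(P_i^\pi (1 - P_i^\pi)\right), \qquad (4)
\end{aligned}$$

where the random variable $P_i^\pi$ is defined as

$$P_i^\pi \triangleq \Pr\left(X_{\pi(i)} = 1 \,|\, X_{\pi(i-1)}, X_{\pi(i-2)}, \ldots, X_{\pi(1)}\right). \qquad (5)$$

The worst-case MMSE predictability of a binary vector $X$ is
defined as

$$\mathrm{MMSE}(X) \triangleq \max_{\pi} \mathrm{MMSE}_\pi(X). \qquad (6)$$
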


Our main result is the following. 

Theorem 1: Let $X$, $Z$ be two statistically independent
n-dimensional random binary vectors, where $X$ is arbitrary and
$Z$ is i.i.d. Bernoulli($\alpha$). Let $Y = X \oplus Z$. Then

$$\frac{1}{n} H(Y) \ge h(\alpha) + (1 - h(\alpha))\,\frac{4\,\mathrm{MMSE}(X)}{n}, \qquad (7)$$

with equality if and only if $X$ is memoryless with
$\Pr(X_i = 1) \in \{0, \frac{1}{2}, 1\}$ for every $i = 1, \ldots, n$.
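For small n, both sides of (7) can be computed exactly by enumeration, which gives a quick sanity check of Theorem 1. The sketch below is illustrative only: the helper names are hypothetical, the joint distribution of X is drawn at random, coordinates are indexed from 0, and entropies are in bits.

```python
import itertools
import math
import random

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def mmse_pi(pmf, perm):
    """MMSE_pi(X) from (4): sum over i of E[P_i (1 - P_i)] for the order perm."""
    total = 0.0
    for i, coord in enumerate(perm):
        past = perm[:i]
        mass, mass_one = {}, {}
        for x, p in pmf.items():
            key = tuple(x[j] for j in past)      # values of already-revealed coordinates
            mass[key] = mass.get(key, 0.0) + p
            if x[coord] == 1:
                mass_one[key] = mass_one.get(key, 0.0) + p
        for key, m in mass.items():
            if m > 0:
                p1 = mass_one.get(key, 0.0) / m  # Pr(X_coord = 1 | past = key)
                total += m * p1 * (1 - p1)
    return total

def worst_case_mmse(pmf, n):
    """MMSE(X) from (6): maximum of MMSE_pi(X) over all permutations."""
    return max(mmse_pi(pmf, perm) for perm in itertools.permutations(range(n)))

def output_pmf(pmf, alpha, n):
    """pmf of Y = X xor Z with Z i.i.d. Bernoulli(alpha), independent of X."""
    out = {}
    for x, px in pmf.items():
        for z in itertools.product((0, 1), repeat=n):
            pz = math.prod(alpha if b else 1 - alpha for b in z)
            y = tuple(a ^ b for a, b in zip(x, z))
            out[y] = out.get(y, 0.0) + px * pz
    return out

def check_theorem1(pmf, alpha, n):
    """Return (left-hand side, right-hand side) of (7)."""
    lhs = entropy(output_pmf(pmf, alpha, n)) / n
    rhs = h(alpha) + (1 - h(alpha)) * 4 * worst_case_mmse(pmf, n) / n
    return lhs, rhs

random.seed(0)
n, alpha = 3, 0.11
weights = [random.random() for _ in range(2 ** n)]
pmf = {x: w / sum(weights)
       for x, w in zip(itertools.product((0, 1), repeat=n), weights)}
lhs, rhs = check_theorem1(pmf, alpha, n)
print(lhs, rhs)  # lhs should not fall below rhs
```

The enumeration costs on the order of n! 4^n operations, so it is only meant for very short vectors.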

In Section II we prove an MMSE version of the conditional
scalar Mrs. Gerber’s Lemma (2), which implies Theorem 1 as
a simple corollary. In Section III we derive several MMSE-based
extensions of Theorem 1, including a lower bound on
$H(Y)$ for the setting where $Z$ is not i.i.d., as well as an
upper bound on $H(Y)$. Section IV compares our new bound
to Mrs. Gerber’s Lemma. As an application of Theorem 1,
in Section V we develop a lower bound on the entropy rate
of a binary hidden Markov process, which is shown to be
considerably stronger than Mrs. Gerber’s Lemma in certain
scenarios. Furthermore, our MMSE-based scalar lower bound
is combined with a bounding technique developed
in [6] to obtain new estimates on the entropy rate of binary
hidden Markov processes.


II. Proofs

Mrs. Gerber’s Lemma is proved by first deriving the conditional
scalar inequality (2) and then invoking the chain rule for
entropy and convexity of the function $g(u) = h(\alpha * h^{-1}(u))$
to arrive at ([T]), see H], I?). Similarly, we begin by proving 
an MMSE version of (2) below, from which Theorem 1 will
follow as a simple corollary.
Lemma 1: Let $U$ be a random variable and let $X|U = u \sim$
Bernoulli($P_u$). Denote the MMSE in estimating $X$ from $U$
by

$$\mathrm{MMSE}(X|U) \triangleq E\left(\mathrm{Var}(X|U)\right) = E\left(P_U(1 - P_U)\right). \qquad (8)$$

Let $Z \sim$ Bernoulli($\alpha$) be statistically independent of $(X, U)$.
Then

$$H(X \oplus Z|U) \ge h(\alpha) + (1 - h(\alpha))\,4\,\mathrm{MMSE}(X|U),$$

with equality if and only if $P_u \in \{0, \frac{1}{2}, 1\}$ for any value of $u$.

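Proof: Since $Z$ is statistically independent of $(X, U)$ we
have

$$H(X \oplus Z|U) = E\,h(P_U * \alpha). \qquad (9)$$

Let $V_u = P_u - \frac{1}{2}$ and note that

$$P_u * \alpha = \left(\tfrac{1}{2} + V_u\right)(1 - \alpha) + \alpha\left(\tfrac{1}{2} - V_u\right) = \tfrac{1}{2} + (1 - 2\alpha)V_u. \qquad (10)$$

Recall that the Taylor series expansion of the binary entropy
function around $\frac{1}{2}$ is

$$h\left(\tfrac{1}{2} + p\right) = 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)}\,(2p)^{2k}, \qquad (11)$$

and therefore, by (10), we have

$$\begin{aligned}
h(P_u * \alpha) &= 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)}\,(1 - 2\alpha)^{2k}(2V_u)^{2k} \qquad (12) \\
&\ge 1 - 4V_u^2 \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)}\,(1 - 2\alpha)^{2k} \\
&= 1 - 4V_u^2\left(1 - h\left(\tfrac{1}{2} + \tfrac{1 - 2\alpha}{2}\right)\right) \\
&= 1 - 4V_u^2\,(1 - h(\alpha)), \qquad (13)
\end{aligned}$$

where (13) follows from the fact that $|2V_u| \le 1$, so that
$(2V_u)^{2k} \le (2V_u)^2$ for every $k \ge 1$, and is
satisfied with equality if and only if $V_u \in \{-\frac{1}{2}, 0, \frac{1}{2}\}$, which
implies that $P_u \in \{0, \frac{1}{2}, 1\}$. Substituting (13) into (9) gives

$$\begin{aligned}
H(X \oplus Z|U) &\ge 1 - (1 - h(\alpha))\,4E\left(V_U^2\right) \\
&= 1 - (1 - h(\alpha))\,4E\left(\tfrac{1}{2} - P_U\right)^2 \\
&= h(\alpha) + (1 - h(\alpha))\,4E\left(P_U(1 - P_U)\right),
\end{aligned}$$

as desired. ■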
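The two ingredients of the proof, the series (11) and the pointwise quadratic bound (13), can be checked numerically. This is a small sketch under the assumption that entropies are measured in bits, so that log(e) in (11) is the base-2 logarithm of e; the function names are illustrative.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def h_taylor(p, terms=200):
    """Truncated series (11) for h(1/2 + p), valid for |p| <= 1/2."""
    s = sum((2 * p) ** (2 * k) / (2 * k * (2 * k - 1))
            for k in range(1, terms + 1))
    return 1 - math.log2(math.e) * s

def lemma1_gap(p_u, alpha):
    """h(p_u * alpha) minus the quadratic lower bound (13); should be >= 0."""
    v = p_u - 0.5
    return h(conv(p_u, alpha)) - (1 - 4 * v * v * (1 - h(alpha)))

alpha = 0.11
grid = [i / 100 for i in range(101)]
print(min(lemma1_gap(p, alpha) for p in grid))  # smallest gap over the grid
print(abs(h_taylor(0.2) - h(0.7)))              # series versus closed form
```

The gap vanishes at p_u equal to 0, 1/2 and 1, which matches the equality condition of Lemma 1.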

Remark 1: Note that the only property of the binary entropy
function used in the proof above is that all coefficients of
(nonzero) even order in its Taylor expansion around $\frac{1}{2}$ are
negative, whereas all odd coefficients are zero. It follows that
for any function $g : [0,1] \to \mathbb{R}$ whose Taylor expansion around
$\frac{1}{2}$ is of the form

$$g\left(\tfrac{1}{2} + p\right) = c_0 - \sum_{k=1}^{\infty} c_k\,p^{2k},$$

where $c_k \ge 0$ for all positive $k$, we have

$$E\,g(\alpha * P_U) \ge g(\alpha) + (c_0 - g(\alpha))\,4\,\mathrm{MMSE}(X|U).$$

Theorem 1 now follows as a straightforward corollary of
Lemma 1.

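Proof of Theorem 1: By the chain rule for entropy, for
any permutation $\pi$ we have

$$\begin{aligned}
H(Y) &= \sum_{i=1}^{n} H\left(Y_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right) \\
&= \sum_{i=1}^{n} H\left(X_{\pi(i)} \oplus Z_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right) \qquad (14) \\
&\ge \sum_{i=1}^{n} h(\alpha) + (1 - h(\alpha))\,4\,\mathrm{MMSE}\left(X_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right). \qquad (15)
\end{aligned}$$

Clearly

$$\begin{aligned}
\mathrm{MMSE}\left(X_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right)
&\ge \mathrm{MMSE}\left(X_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}, Z_{\pi(i-1)}, \ldots, Z_{\pi(1)}\right) \\
&= \mathrm{MMSE}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, \ldots, X_{\pi(1)}, Z_{\pi(i-1)}, \ldots, Z_{\pi(1)}\right) \\
&= \mathrm{MMSE}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, \ldots, X_{\pi(1)}\right), \qquad (16)
\end{aligned}$$

where the last equality follows since the random variables
$\{Z_i\}_{i=1}^{n}$ are statistically independent of $\{X_i\}_{i=1}^{n}$. Thus, for
any permutation $\pi$ we have

$$H(Y) \ge n\,h(\alpha) + (1 - h(\alpha))\,4 \sum_{i=1}^{n} \mathrm{MMSE}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, \ldots, X_{\pi(1)}\right), \qquad (17)$$

and (7) follows by maximizing (17) w.r.t. $\pi$. By
Lemma 1 the inequality (15) is tight if and only if
$\Pr\left(X_{\pi(i)} = 1 \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right) \in \{0, \frac{1}{2}, 1\}$ for every $i$ and
every realization of the vector $(Y_{\pi(i-1)}, \ldots, Y_{\pi(1)})$, whereas
for $0 < \alpha < 1$ the inequality (16) is tight if and only if $X$
is memoryless. Thus, (7) holds with equality if and only if
$X$ is memoryless with $\Pr(X_i = 1) \in \{0, \frac{1}{2}, 1\}$ for every
$i = 1, \ldots, n$. ■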


III. Extensions 


In this section we derive several simple extensions of our
main results. Since the proofs are quite similar to those of
Lemma 1 and Theorem 1, we omit the full details and only
sketch the differences instead.

We begin with a straightforward extension of Theorem 1
to the conditional entropy $H(Y|W)$ where $X$ may depend on
$W$, while $Z$ and $W$ are statistically independent.

Theorem 2: Let $W$ be some random variable, and let $X$, $Z$
be two n-dimensional random binary vectors, where $X$ is
arbitrary and $Z$ is i.i.d. Bernoulli($\alpha$). Assume that $(X, W)$
is statistically independent of $Z$, and let $Y = X \oplus Z$. Then

$$\frac{1}{n} H(Y|W) \ge h(\alpha) + (1 - h(\alpha))\,\frac{4\,\mathrm{MMSE}(X|W)}{n},$$

where $\mathrm{MMSE}(X|W)$ is defined as in (4)-(6) with $W$ added to the
conditioning in every term,
with equality if and only if $X|W = w$ is memoryless with
$\Pr(X_i = 1|W = w) \in \{0, \frac{1}{2}, 1\}$ for every $i = 1, \ldots, n$ and
every $w$.

Proof: The proof is omitted as it follows the exact same
steps as in the proof of Theorem 1, where the conditioning on
$W$ is added where relevant. ■

Next, we show that our lower bound can also be extended to
the case of a binary noisy channel with memory. To that end,
we first need to derive a simple generalization of Lemma 1.

Lemma 2: Let $U = (T, W)$, where $T$ and $W$ are statistically
independent. Let $X$ and $Z$ be conditionally independent
given $U$, such that $X|U = (t, w) \sim$ Bernoulli($P_t$) and
$Z|U = (t, w) \sim$ Bernoulli($\alpha_w$). Let $\mathrm{MMSE}(X|U) = \mathrm{MMSE}(X|T)$
be as defined in (8). Then

$$H(X \oplus Z|U) \ge H(Z|W) + (1 - H(Z|W))\,4\,\mathrm{MMSE}(X|T),$$

with equality if and only if $P_t \in \{0, \frac{1}{2}, 1\}$ for any value of $t$.

Sketch of proof: The proof follows the same lines as
the proof of Lemma 1. Since $X$ and $Z$ are conditionally
independent given $U$, we have $H(X \oplus Z|U) = E\,h(P_T * \alpha_W)$. By (13)
we have that

$$h(P_T * \alpha_W) \ge 1 - 4\left(\tfrac{1}{2} - P_T\right)^2 (1 - h(\alpha_W)).$$

Since $T$ and $W$ are statistically independent, we therefore have

$$\begin{aligned}
E_U\,h(P_T * \alpha_W) &\ge E_U\left(1 - 4\left(\tfrac{1}{2} - P_T\right)^2 (1 - h(\alpha_W))\right) \\
&= 1 - 4E_T\left(\tfrac{1}{2} - P_T\right)^2 \left(1 - E_W h(\alpha_W)\right),
\end{aligned}$$

and the lemma follows by recalling that $4E_T\left(\frac{1}{2} - P_T\right)^2 =
1 - 4\,\mathrm{MMSE}(X|T)$ and that $E_W h(\alpha_W) = H(Z|W)$. ■

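As a simple corollary, we obtain the following.

Theorem 3: Let $X$, $Z$ be two statistically independent
n-dimensional random binary vectors, and let $Y = X \oplus Z$. Then

$$H(Y) \ge \max_{\pi}\left\{ H(Z) + 4\,\mathrm{MMSE}_\pi(X) - 4\sum_{i=1}^{n} H\left(Z_{\pi(i)} \,|\, Z_{\pi(i-1)}, \ldots, Z_{\pi(1)}\right) \cdot \mathrm{MMSE}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, \ldots, X_{\pi(1)}\right) \right\},$$

with equality if and only if $Z$ is memoryless and $X$ is memoryless
with $\Pr(X_i = 1) \in \{0, \frac{1}{2}, 1\}$ for every $i = 1, \ldots, n$.

Proof: By the chain rule for entropy, for any permutation
$\pi$ we have

$$\begin{aligned}
H(Y) &= \sum_{i=1}^{n} H\left(Y_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right) \\
&\ge \sum_{i=1}^{n} H\left(Y_{\pi(i)} \,|\, X_{\pi(i-1)}, \ldots, X_{\pi(1)}, Z_{\pi(i-1)}, \ldots, Z_{\pi(1)}\right) \qquad (18) \\
&= \sum_{i=1}^{n} H\left(X_{\pi(i)} \oplus Z_{\pi(i)} \,|\, T_\pi^i, W_\pi^i\right),
\end{aligned}$$

where the random variables

$$T_\pi^i \triangleq \left(X_{\pi(i-1)}, \ldots, X_{\pi(1)}\right), \qquad W_\pi^i \triangleq \left(Z_{\pi(i-1)}, \ldots, Z_{\pi(1)}\right)$$

are statistically independent, and $X_{\pi(i)}$ and $Z_{\pi(i)}$ are
conditionally independent given $(T_\pi^i, W_\pi^i)$, since $X$ and $Z$ are
statistically independent. The inequality (18) is tight if and
only if $X$ and $Y$ are both memoryless. Now, by Lemma 2 we
have that

$$H\left(X_{\pi(i)} \oplus Z_{\pi(i)} \,|\, T_\pi^i, W_\pi^i\right) \ge H\left(Z_{\pi(i)} | W_\pi^i\right) + \left(1 - H\left(Z_{\pi(i)} | W_\pi^i\right)\right) 4\,\mathrm{MMSE}\left(X_{\pi(i)} | T_\pi^i\right).$$

Summing over $i$ gives the desired result. ■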

A simple consequence of Theorem 3 is that if $X$ and $Z$ are
statistically independent binary symmetric first-order Markov
processes with transition probabilities $q_1$ and $q_2$, respectively,
then $\frac{1}{n} H(Y) \ge h(q_1) + 4q_2(1 - q_2)(1 - h(q_1))$. This bound
uses the identity permutation $\pi = (1, \ldots, n)$. We note that a
more clever choice of $\pi$, as used in Section V, can result in a
better bound.

We end this section by deriving an upper bound on $H(Y)$
in terms of the best-case MMSE predictability of $X$ from $Y$

$$\mathrm{MMSE}(X|Y) \triangleq \min_{\pi} \sum_{i=1}^{n} \mathrm{MMSE}\left(X_{\pi(i)} \,|\, Y_{\pi(i-1)}, \ldots, Y_{\pi(1)}\right).$$

To that end, we first upper bound $H(X \oplus Z|U)$ in terms of
$\mathrm{MMSE}(X|U)$.

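Lemma 3: Let $U$ be some random variable and let $X|U =
u \sim$ Bernoulli($P_u$). Let $Z \sim$ Bernoulli($\alpha$) be statistically
independent of $(X, U)$. Then

$$H(X \oplus Z|U) \le h\left(\frac{1}{2} + \frac{1 - 2\alpha}{2}\sqrt{1 - 4\,\mathrm{MMSE}(X|U)}\right),$$

with equality if and only if $|P_u - \frac{1}{2}|$ does not depend on $u$.

Proof: Define the function $Q(t) = h\left(\frac{1}{2} + \sqrt{t}\right)$ and note
that it is concave over $[0, \frac{1}{4}]$. By (9) and (10) we have

$$\begin{aligned}
H(X \oplus Z|U) &= E\,h\left(\tfrac{1}{2} + (1 - 2\alpha)\left(P_U - \tfrac{1}{2}\right)\right) \\
&= E\,h\left(\tfrac{1}{2} + \sqrt{(1 - 2\alpha)^2\left(P_U - \tfrac{1}{2}\right)^2}\right) \\
&\le h\left(\tfrac{1}{2} + \sqrt{E\left[(1 - 2\alpha)^2\left(P_U - \tfrac{1}{2}\right)^2\right]}\right) \\
&= h\left(\tfrac{1}{2} + (1 - 2\alpha)\sqrt{E\left[\tfrac{1}{4} - P_U(1 - P_U)\right]}\right) \\
&= h\left(\tfrac{1}{2} + \tfrac{1 - 2\alpha}{2}\sqrt{1 - 4\,\mathrm{MMSE}(X|U)}\right), \qquad (19)
\end{aligned}$$

where the second equality uses the symmetry of $h$ around $\frac{1}{2}$ and the
inequality is Jensen's inequality applied to $Q$,
as desired.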
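Lemma 3 can be checked numerically for a finite-valued U. The following is a sketch with illustrative names, where `weights[u]` plays the role of Pr(U = u) and `probs[u]` the role of P_u; both are drawn at random here and are not taken from the paper.

```python
import math
import random

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def lemma3_sides(weights, probs, alpha):
    """Return (H(X xor Z | U), upper bound of Lemma 3) for a finite-valued U."""
    cond_entropy = sum(w * h(conv(p, alpha)) for w, p in zip(weights, probs))
    mmse = sum(w * p * (1 - p) for w, p in zip(weights, probs))
    root = math.sqrt(max(0.0, 1 - 4 * mmse))   # guard against rounding below zero
    bound = h(0.5 + 0.5 * (1 - 2 * alpha) * root)
    return cond_entropy, bound

random.seed(1)
raw = [random.random() for _ in range(5)]
weights = [r / sum(raw) for r in raw]
probs = [random.random() for _ in range(5)]
print(lemma3_sides(weights, probs, 0.11))      # first entry should not exceed the second
```

When |P_u - 1/2| is the same for every u, for instance `probs = [0.2, 0.8]`, the two returned values agree up to rounding, in line with the equality condition.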
Remark 2: In the special case where $\alpha = 0$, Lemma 3
reduces to the inequality

$$E\,h(P_U) \le h\left(\frac{1}{2} + \frac{1}{2}\sqrt{1 - 4\,\mathrm{MMSE}(X|U)}\right),$$

which was obtained by Wyner in [8, eq. (3.11)].

The function $F_\alpha(x) = h\left(\frac{1}{2} + \frac{1 - 2\alpha}{2}\sqrt{1 - 4x}\right)$ is concave
and monotone non-decreasing for $x \in [0, \frac{1}{4}]$ and any value of
$\alpha \in [0, \frac{1}{2}]$. Combining this with (14) and with Lemma 3 gives
the following.

Theorem 4: Let $X$, $Z$ be two statistically independent
n-dimensional random binary vectors, where $X$ is arbitrary and
$Z$ is i.i.d. Bernoulli($\alpha$). Let $Y = X \oplus Z$. Then

$$\frac{1}{n} H(Y) \le h\left(\frac{1}{2} + \frac{1 - 2\alpha}{2}\sqrt{1 - \frac{4\,\mathrm{MMSE}(X|Y)}{n}}\right),$$

with equality if and only if $X$ is i.i.d.


IV. Comparison with Mrs. Gerber’s Lemma 

In this section we compare the performance of our MMSE-based
bound to Mrs. Gerber’s Lemma. First, we consider
the family of random vectors with fixed $\mathrm{MMSE}(X)$. Clearly,
the bound from Theorem 1 is the same for all members of
this family. However, the entropy $H(X)$ may vary within
the family, and hence applying Mrs. Gerber’s Lemma results
in a range of bounds, which can be juxtaposed with the
bound of Theorem 1. Similarly, we fix $H(X)$ and juxtapose
Mrs. Gerber’s Lemma with the range of bounds obtained by
applying Theorem 1.

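For the special case of $\alpha = 0$, Theorem 1 reads

$$H(X) \ge 4\,\mathrm{MMSE}(X), \qquad (20)$$

and Theorem 4 reads

$$\frac{1}{n} H(X) \le h\left(\frac{1}{2} + \frac{1}{2}\sqrt{1 - \frac{4\,\mathrm{MMSE}(X)}{n}}\right). \qquad (21)$$

Denote the RHS of (1) by

$$\mathrm{MGL}(\alpha, P_X) \triangleq h\left(\alpha * h^{-1}\left(\frac{H(X)}{n}\right)\right)$$

and the RHS of (7) by

$$\mathrm{NEW}(\alpha, P_X) \triangleq h(\alpha) + (1 - h(\alpha))\,\frac{4\,\mathrm{MMSE}(X)}{n}.$$

By (20) and (21), it follows that

$$h\left(\alpha * h^{-1}\left(\frac{4\,\mathrm{MMSE}(X)}{n}\right)\right) \le \mathrm{MGL}(\alpha, P_X) \le h\left(\alpha * \left(\frac{1}{2} - \frac{1}{2}\sqrt{1 - \frac{4\,\mathrm{MMSE}(X)}{n}}\right)\right). \qquad (22)$$
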
Figure 1a depicts the lower and upper bound on $\mathrm{MGL}(\alpha, P_X)$
from (22) as a function of $\mathrm{MMSE}(X)$ along with
$\mathrm{NEW}(\alpha, P_X)$, for $\alpha = 0.11$. It is seen that for all values
of $\mathrm{MMSE}(X)$ our bound is quite close to the upper
bound on $\mathrm{MGL}(\alpha, P_X)$, and is often significantly stronger
than the lower bound on $\mathrm{MGL}(\alpha, P_X)$. In general, for small
values of $\alpha$, $\mathrm{NEW}(\alpha, P_X)$ will be close to the lower bound
on $\mathrm{MGL}(\alpha, P_X)$ and will approach the upper bound on
$\mathrm{MGL}(\alpha, P_X)$ as $\alpha$ increases. Figure 1b demonstrates this
phenomenon for $4\,\mathrm{MMSE}(X) = 0.5$.

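Equivalently, by (20) and (21), we also have that

$$4n\,h^{-1}\left(\frac{H(X)}{n}\right)\left(1 - h^{-1}\left(\frac{H(X)}{n}\right)\right) \le 4\,\mathrm{MMSE}(X) \le H(X). \qquad (23)$$

In fact, (23) holds for $4\,\mathrm{MMSE}_\pi(X)$ with any permutation $\pi$,
and implies

$$h(\alpha) + (1 - h(\alpha))\,4h^{-1}\left(\frac{H(X)}{n}\right)\left(1 - h^{-1}\left(\frac{H(X)}{n}\right)\right) \le \mathrm{NEW}(\alpha, P_X) \le h(\alpha) + (1 - h(\alpha))\,\frac{H(X)}{n}. \qquad (24)$$
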

Figure 2a depicts the lower and upper bound on
$\mathrm{NEW}(\alpha, P_X)$ from (24) as a function of $H(X)$ along with
$\mathrm{MGL}(\alpha, P_X)$, for $\alpha = 0.11$. It is seen that for all values
of $H(X)$, $\mathrm{MGL}(\alpha, P_X)$ is quite close to the lower bound
on $\mathrm{NEW}(\alpha, P_X)$, and is often significantly weaker than the
upper bound on $\mathrm{NEW}(\alpha, P_X)$. In general, for small values
of $\alpha$, $\mathrm{MGL}(\alpha, P_X)$ will be close to the upper bound
on $\mathrm{NEW}(\alpha, P_X)$ and will approach the lower bound on
$\mathrm{NEW}(\alpha, P_X)$ as $\alpha$ increases. Figure 2b demonstrates this
phenomenon for $H(X) = 0.5$.
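The comparison for a fixed per-symbol entropy H(X)/n can be reproduced numerically from the two ends of (24) and the right-hand side of (1). The following sketch uses illustrative function names and a bisection-based inverse entropy; it is not the code behind the paper's figures.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def h_inv(v, tol=1e-12):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def mgl_bound(alpha, entropy_per_symbol):
    """Right-hand side of (1) as a function of H(X)/n."""
    return h(conv(alpha, h_inv(entropy_per_symbol)))

def new_bound_range(alpha, entropy_per_symbol):
    """Lower and upper ends of (24) as a function of H(X)/n."""
    p = h_inv(entropy_per_symbol)
    lower = h(alpha) + (1 - h(alpha)) * 4 * p * (1 - p)
    upper = h(alpha) + (1 - h(alpha)) * entropy_per_symbol
    return lower, upper

alpha = 0.11
for ent in (0.1, 0.5, 0.9):
    lower, upper = new_bound_range(alpha, ent)
    print(ent, lower, mgl_bound(alpha, ent), upper)
```

For every value of H(X)/n the Mrs. Gerber bound lies between the two ends of (24): the lower end is Lemma 1 applied to a deterministic U, and the upper end follows from the convexity of h(α * h^{-1}(u)).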


V. Application: Lower Bound on the Entropy Rate
of a Binary Hidden Markov Process

In this section we apply Theorem 1 to derive a simple
lower bound on the entropy rate of a binary hidden Markov
process. Let $X_1 \sim$ Bernoulli($\frac{1}{2}$) and for $m = 2, 3, \ldots$ let
$X_m = X_{m-1} \oplus W_m$ where $\{W_m\}$ is an i.i.d. Bernoulli($q$)
process statistically independent of $X_1$. Clearly, the process
$\{X_n\}$ is a symmetric first-order Markov process. We define
the hidden Markov process $Y_n = X_n \oplus Z_n$ where $\{Z_n\}$ is
an i.i.d. Bernoulli($\alpha$) process statistically independent of the
process $\{X_n\}$. Our goal in this section is to derive a lower
bound on the entropy rate of $\{Y_n\}$ defined as

$$\bar{H}(Y) \triangleq \lim_{n \to \infty} \frac{H(Y_1, \ldots, Y_n)}{n}. \qquad (25)$$

One very simple bound can be obtained by noting that
$\bar{H}(X) = h(q)$ and applying Mrs. Gerber’s Lemma (1), which
gives

$$\bar{H}(Y) \ge h(\alpha * q). \qquad (26)$$

We will see that in many cases our MMSE-based bound from
Theorem 1 provides tighter bounds.
Fig. 1. Comparison between the lower and upper bounds on MGL(α, P_X) from (22) and NEW(α, P_X). Panel (b): 4MMSE(X) = 0.5.

Fig. 2. Comparison between the lower and upper bounds on NEW(α, P_X) from (24) and MGL(α, P_X). Panel (b): H(X) = 0.5.

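Note that for any $\pi$ it holds that $\mathrm{MMSE}(X) \ge \mathrm{MMSE}_\pi(X)$
and therefore Theorem 1 implies that for any choice of $\pi$

$$\frac{1}{n} H(Y) \ge h(\alpha) + (1 - h(\alpha))\,\frac{4\,\mathrm{MMSE}_\pi(X)}{n}. \qquad (27)$$

Thus, in order to apply Theorem 1 we need to choose some
$\pi$ and evaluate $\mathrm{MMSE}_\pi(X)$. A trivial choice is the identity
$\pi = (1, 2, \ldots, n)$, for which $\mathrm{MMSE}_\pi(X)/n \to q(1 - q)$ as $n \to \infty$ and
our bound yields $\bar{H}(Y) \ge h(\alpha) + (1 - h(\alpha))\,4q(1 - q)$. It
is easy to see that this choice of $\pi$ yields the lower bound
on $\mathrm{NEW}(\alpha, P_X)$ from (24), and is therefore strictly weaker
than (26). We would therefore like to choose a permutation
$\pi$ that will incur a higher value of $\mathrm{MMSE}_\pi(X)$. Assume that
$\log n$ is an integer. A natural candidate is the following

$$\pi = \left(n, \frac{n}{2}, \frac{n}{4}, \frac{3n}{4}, \frac{n}{8}, \frac{3n}{8}, \frac{5n}{8}, \frac{7n}{8}, \frac{n}{16}, \frac{3n}{16}, \ldots\right). \qquad (28)$$

With this choice of $\pi$ we have that if $\pi(i) = r\frac{n}{2^k}$ for $r =
1, 3, \ldots, 2^k - 1$, we have that

$$\begin{aligned}
\mathrm{MMSE}\left(X_{\pi(i)} \,|\, X_{\pi(i-1)}, X_{\pi(i-2)}, \ldots, X_{\pi(1)}\right)
&\ge \mathrm{MMSE}\left(X_{r\frac{n}{2^k}} \,\Big|\, X_{(r-1)\frac{n}{2^k}}, X_{(r+1)\frac{n}{2^k}}\right) \\
&\triangleq \mathrm{MMSE}\left(\frac{n}{2^k}\right),
\end{aligned}$$

where the inequality follows from the Markovity of $\{X_n\}$
which implies that the conditional distribution of $X_{\pi(i)}$ given
multiple samples from the past and the future of the process
depends only on the nearest sample from the past and the
nearest sample from the future. We therefore have

$$\begin{aligned}
\frac{\mathrm{MMSE}_\pi(X)}{n} &\ge \frac{1}{n}\sum_{k=1}^{\log n} 2^{k-1}\,\mathrm{MMSE}\left(\frac{n}{2^k}\right) \\
&= \sum_{t=0}^{\log n - 1} 2^{-(t+1)}\,\mathrm{MMSE}\left(2^t\right). \qquad (29)
\end{aligned}$$

It now only remains to calculate

$$\begin{aligned}
\mathrm{MMSE}(\ell) &= \mathrm{MMSE}(X_n \,|\, X_{n-\ell}, X_{n+\ell}) \\
&= E\left(P_\ell^1(X_{n+\ell}, X_{n-\ell})\,P_\ell^0(X_{n+\ell}, X_{n-\ell})\right), \qquad (30)
\end{aligned}$$

where the random variable $P_\ell^i$ is defined as

$$\begin{aligned}
P_\ell^i(x_{n+\ell}, x_{n-\ell}) &\triangleq \Pr(X_n = i \,|\, X_{n-\ell} = x_{n-\ell}, X_{n+\ell} = x_{n+\ell}) \\
&= \frac{\Pr(X_{n+\ell} = x_{n+\ell}, X_n = i \,|\, X_{n-\ell} = x_{n-\ell})}{\Pr(X_{n+\ell} = x_{n+\ell} \,|\, X_{n-\ell} = x_{n-\ell})} \\
&= \frac{\Pr(X_{n+\ell} = x_{n+\ell} \,|\, X_n = i)\,\Pr(X_n = i \,|\, X_{n-\ell} = x_{n-\ell})}{\Pr(X_{n+\ell} = x_{n+\ell} \,|\, X_{n-\ell} = x_{n-\ell})}
\end{aligned}$$

for $i = 0, 1$. Let $p_k \triangleq \Pr(X_{n+k} \ne X_n)$. With this notation
we have that if $X_{n+\ell} \ne X_{n-\ell}$ then

$$P_\ell^0(X_{n+\ell}, X_{n-\ell}) = P_\ell^1(X_{n+\ell}, X_{n-\ell}) = \frac{p_\ell(1 - p_\ell)}{p_{2\ell}}. \qquad (31)$$

On the other hand, if $X_{n+\ell} = X_{n-\ell}$ we have

$$P_\ell^1(X_{n+\ell}, X_{n-\ell})\,P_\ell^0(X_{n+\ell}, X_{n-\ell}) = \frac{p_\ell^2}{1 - p_{2\ell}} \cdot \frac{(1 - p_\ell)^2}{1 - p_{2\ell}}. \qquad (32)$$

It therefore follows that

$$\begin{aligned}
\mathrm{MMSE}(\ell) &= \Pr(X_{n+\ell} \ne X_{n-\ell})\left(\frac{p_\ell(1 - p_\ell)}{p_{2\ell}}\right)^2 + \Pr(X_{n+\ell} = X_{n-\ell})\left(\frac{p_\ell(1 - p_\ell)}{1 - p_{2\ell}}\right)^2 \\
&= \left(p_\ell(1 - p_\ell)\right)^2\left(\frac{1}{p_{2\ell}} + \frac{1}{1 - p_{2\ell}}\right) \\
&= \frac{\left(p_\ell(1 - p_\ell)\right)^2}{p_{2\ell}(1 - p_{2\ell})}. \qquad (33)
\end{aligned}$$

Note that

$$\begin{aligned}
p_k &= \Pr(X_{n+k} \ne X_n) \\
&= \frac{1 - E\left(\prod_{i=1}^{k} (-1)^{W_{n+i}}\right)}{2} \\
&= \frac{1 - (1 - 2q)^k}{2}. \qquad (34)
\end{aligned}$$

Substituting (34) into (33) gives

$$\begin{aligned}
\mathrm{MMSE}(\ell) &= \frac{1}{4} \cdot \frac{\left(1 - (1 - 2q)^{2\ell}\right)^2}{\left(1 - (1 - 2q)^{2\ell}\right)\left(1 + (1 - 2q)^{2\ell}\right)} \\
&= \frac{1}{4} \cdot \frac{1 - (1 - 2q)^{2\ell}}{1 + (1 - 2q)^{2\ell}}. \qquad (35)
\end{aligned}$$

Substituting (35) into (29) gives

$$\begin{aligned}
\lim_{n \to \infty} \frac{4\,\mathrm{MMSE}_\pi(X)}{n} &\ge \sum_{t=0}^{\infty} 2^{-(t+1)}\,\frac{1 - (1 - 2q)^{2^{t+1}}}{1 + (1 - 2q)^{2^{t+1}}} \\
&= \sum_{t=1}^{\infty} 2^{-t}\,\frac{1 - (1 - 2q)^{2^t}}{1 + (1 - 2q)^{2^t}}, \qquad (36)
\end{aligned}$$

and consequently we get the following theorem.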

Theorem 5: Let $\{X_n\}$ be a first-order Markov process
with parameter $q$, $\{Z_n\}$ be an i.i.d. Bernoulli($\alpha$) process
statistically independent of $\{X_n\}$ and $Y_n = X_n \oplus Z_n$. Then

$$\bar{H}(Y) \ge h(\alpha) + (1 - h(\alpha)) \sum_{t=1}^{\infty} 2^{-t}\,\frac{1 - (1 - 2q)^{2^t}}{1 + (1 - 2q)^{2^t}}.$$

Remark 3: For every $\alpha \in (0, 1/2)$ there exists $q_\alpha > 0$ such
that the bound from Theorem 5 outperforms Mrs. Gerber’s
Lemma for all $q \in (0, q_\alpha)$. For example, $q_{0.11} \approx 0.212$. As
discussed in the previous section, $q_\alpha$ increases with $\alpha$ and
approaches $1/2$ as $\alpha \to 1/2$.
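The bound of Theorem 5 is a rapidly converging series and can be compared with the Mrs. Gerber bound (26) directly. Below is a minimal sketch with illustrative function names; the series is truncated after `terms` summands, which leaves a truncation error of at most 2^(-terms).

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def theorem5_bound(alpha, q, terms=60):
    """Lower bound of Theorem 5 with the series truncated after `terms` terms."""
    a = 1 - 2 * q
    total = 0.0
    for t in range(1, terms + 1):
        a_pow = a ** (2 ** t)                  # (1 - 2q)^(2^t)
        total += 2.0 ** (-t) * (1 - a_pow) / (1 + a_pow)
    return h(alpha) + (1 - h(alpha)) * total

def mgl_rate_bound(alpha, q):
    """Mrs. Gerber bound (26) on the entropy rate."""
    return h(conv(alpha, q))

alpha = 0.11
for q in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(q, theorem5_bound(alpha, q), mgl_rate_bound(alpha, q))
```

For α = 0.11 the printed table shows the MMSE-based bound ahead of (26) at small q and behind it at larger q, in line with Remark 3.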

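It will be instructive to study the behavior of the RHS of (36)
in the limit of $q \to 0$. To this end we write, for some $0 < \gamma < 1$
such that $-\gamma \log q$ is an integer

$$\begin{aligned}
\lim_{n \to \infty} \frac{4\,\mathrm{MMSE}_\pi(X)}{n} &\ge \sum_{t=1}^{\infty} 2^{-t}\,\frac{1 - (1 - 2q)^{2^t}}{1 + (1 - 2q)^{2^t}} \\
&\ge \sum_{t=1}^{-\gamma \log q} 2^{-t}\,\frac{1 - (1 - 2q)^{2^t}}{2} \\
&= \sum_{t=1}^{-\gamma \log q} 2^{-(t+1)}\left(2^{t+1} q - \sum_{k=2}^{2^t} (-1)^k \binom{2^t}{k} (2q)^k\right) \\
&\ge \sum_{t=1}^{-\gamma \log q} 2^{-(t+1)}\left(2^{t+1} q - \sum_{k=2}^{2^t} \left(2^t\right)^k (2q)^k\right) \\
&\ge \sum_{t=1}^{-\gamma \log q} \left(q - 2^{-(t+1)} \sum_{k=2}^{\infty} \left(2^{t+1} q\right)^k\right). \qquad (37)
\end{aligned}$$

Using the fact that $\sum_{k=2}^{\infty} r^k = \frac{r^2}{1 - r}$ for $0 < r < 1$,
we further bound (37) as

$$\begin{aligned}
\lim_{n \to \infty} \frac{4\,\mathrm{MMSE}_\pi(X)}{n} &\ge \sum_{t=1}^{-\gamma \log q} \left(q - 2^{-(t+1)}\,\frac{\left(2^{t+1} q\right)^2}{1 - 2^{t+1} q}\right) \\
&= \sum_{t=1}^{-\gamma \log q} \left(q - \frac{2^{t+1} q^2}{1 - 2^{t+1} q}\right) \\
&\ge -\gamma\,q \log q \left(1 - \frac{2q^{1-\gamma}}{1 - 2q^{1-\gamma}}\right). \qquad (38)
\end{aligned}$$

For $q \to 0$ we can take $\gamma = 1 - 1/\sqrt{-\log q}$ such that

$$\begin{aligned}
\lim_{n \to \infty} \frac{4\,\mathrm{MMSE}_\pi(X)}{n} &\ge -q \log(q)\,(1 - \epsilon_q) \\
&= h(q)\,(1 - \epsilon'_q), \qquad (39)
\end{aligned}$$

where $\epsilon_q, \epsilon'_q \to 0$ as $q \to 0$. We have therefore obtained that

$$\lim_{q \to 0} \lim_{n \to \infty} \frac{4\,\mathrm{MMSE}_\pi(X)}{n\,h(q)} = \lim_{q \to 0} \lim_{n \to \infty} \frac{4\,\mathrm{MMSE}_\pi(X)}{H(X)} \ge 1.$$
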
Thus, we have seen that while the trivial choice $\pi' =
(1, 2, \ldots, n)$ yields $\mathrm{MMSE}_{\pi'}(X)$ that meets the lower bound
from (23), the more clever choice of $\pi$ given in (28) yields
$\mathrm{MMSE}_\pi(X)$ that meets the upper bound from (23) in the limit.
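The ordering (28) and the resulting sum (29) are simple to generate for a finite blocklength. The sketch below is illustrative: the function names are hypothetical, n is assumed to be a power of two, and `dyadic_lower_bound` evaluates the right-hand side of (29) using the closed form (35).

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def dyadic_permutation(n):
    """Ordering (28): n, n/2, n/4, 3n/4, n/8, 3n/8, ... for n a power of two."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    perm = [n]
    step = n // 2
    while step >= 1:
        perm.extend(range(step, n, 2 * step))  # odd multiples of the current step
        step //= 2
    return perm

def mmse_gap(ell, q):
    """MMSE(ell) from (35): error in estimating X_n from X_{n-ell} and X_{n+ell}."""
    a = (1 - 2 * q) ** (2 * ell)
    return 0.25 * (1 - a) / (1 + a)

def dyadic_lower_bound(n, q):
    """Right-hand side of (29): sum over t < log2(n) of 2^-(t+1) MMSE(2^t)."""
    levels = n.bit_length() - 1                # log2(n) for a power of two
    return sum(2.0 ** -(t + 1) * mmse_gap(2 ** t, q) for t in range(levels))

print(dyadic_permutation(8))
q, n = 0.01, 2 ** 20
print(4 * q * (1 - q), 4 * dyadic_lower_bound(n, q), h(q))
```

For q = 0.01 the three printed values are the identity-permutation value 4q(1 - q), the dyadic value from (29), and h(q); the middle one lies strictly between the other two.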

Remark 4: The permutation $\pi$ from (28) can be found by
a greedy algorithm that constructs the permutation vector
sequentially by choosing in the $i$th step

$$\pi(i) = \operatorname*{argmax}_{j \in [n] \setminus \{\pi(1), \ldots, \pi(i-1)\}} \mathrm{MMSE}\left(X_j \,|\, X_{\pi(i-1)}, \ldots, X_{\pi(1)}\right),$$

where $[n] = \{1, \ldots, n\}$. The asymptotic optimality of $\pi$
from (28) for symmetric Markov chains may suggest that
such a greedy algorithm will always yield the permutation
vector that maximizes $\mathrm{MMSE}_\pi(X)$. This is, unfortunately,
not true in general. As a counterexample consider the vector
$X = (X_1, X_2)$ with

Pr(Xi = 0,X2 = 0) = i ; Pr(Xi = 0,^2 = 1) = 0 
Pr(Xi = 1,X2 = 0) = e ; Pr(Xi = 1,X2 = 1) = ^ - e 
for which $\mathrm{Var}(X_1) > \mathrm{Var}(X_2)$ but
$\mathrm{Var}(X_2) + \mathrm{MMSE}(X_1|X_2) > \mathrm{Var}(X_1) + \mathrm{MMSE}(X_2|X_1)$
for $\epsilon$ small enough.

Substituting (39) into Theorem 5 gives that for small $q$

$$\bar{H}(Y) \ge h(\alpha) + (1 - h(\alpha))\,h(q)\,(1 - \epsilon'_q). \qquad (40)$$

Note that this bound has an infinite slope at $q = 0$. This
is always better than the Cover-Thomas type of bounds
$\bar{H}(Y) \ge H(Y_m | Y_{m-1}, \ldots, Y_1, X_0)$ derived in [9, Theorem
4.5.1] which are always smaller than $h(q^{*m} * \alpha)$, where
$q^{*m}$ denotes convolving $q$ with itself $m$ times. Both bounds
evaluate to $h(\alpha)$ at $q = 0$, but the derivative of the latter is
finite for any finite $m$. Thus, for small $q$ our bound is better
than the Cover-Thomas bound of any order.

The bound (40) is weaker than the best known lower bounds
on $\bar{H}(Y)$ in the rare transition regime. For example, in [10] it
is shown that $\bar{H}(Y) \ge h(\alpha) - q \log q$, whereas in [11]
this was improved to $\bar{H}(Y) \ge h(\alpha) + h(q) - Cq$ for some
$C > 0$. However, the two bounds mentioned above are “tailor-made”
to hidden Markov models, whereas (40) follows from
applying our generic bound from Theorem 1 to the special
case of a hidden Markov model. In the next subsection we
will show that the scalar version of our MMSE-based bound,
stated in Lemma 1, can be used to enhance such a “tailor-made”
bound for Markov chains.


A. Bound based on the Ordentlich-Weissman Method

In [6], E. Ordentlich and T. Weissman cleverly observed
that the entropy rate of a binary symmetric first-order hidden
Markov process can be expressed as

$$\bar{H}(Y) = E\,h\left(\frac{e^{W_i}}{1 + e^{W_i}} * \alpha * q\right), \qquad (41)$$

where the auto-regressive process $W_i$ is defined as

$$W_i = R_i \ln\frac{1 - \alpha}{\alpha} + S_i\,f(W_{i-1}) \qquad (42)$$

for

$$f(t) = \ln\frac{e^t(1 - q) + q}{q e^t + (1 - q)} \qquad (43)$$

and i.i.d. processes $\{R_i\}$ and $\{S_i\}$ statistically independent of
$W_0$, with distributions

$$R_i = \begin{cases} 1 & \text{w.p. } 1 - \alpha \\ -1 & \text{w.p. } \alpha \end{cases}, \qquad S_i = \begin{cases} 1 & \text{w.p. } 1 - q \\ -1 & \text{w.p. } q \end{cases}. \qquad (44)$$

The expectation in (41) is taken under the assumption that $W_0$
is distributed according to the (unique) stationary distribution
of the process $\{W_i\}$, and is therefore well-defined. In [6],
upper and lower bounds on $\bar{H}(Y)$ were derived by analyzing
the support of the process $\{W_i\}$. Here, we apply Lemma 1 in
order to derive a lower bound on $\bar{H}(Y)$. To this end, we set

$$X | W_i \sim \text{Bernoulli}\left(\frac{e^{W_i}}{1 + e^{W_i}}\right)$$

and find a lower bound on

$$\mathrm{MMSE}(X|W_i) = E\left(\frac{e^{W_i}}{\left(1 + e^{W_i}\right)^2}\right).$$


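Let $F \triangleq e^{f(W_{i-1})}$ and $\eta \triangleq \frac{1 - \alpha}{\alpha}$, such that $e^{W_i} = \eta^{R_i} F^{S_i}$.
We have

$$\begin{aligned}
E\left(\frac{e^{W_i}}{\left(1 + e^{W_i}\right)^2} \,\Big|\, F\right) &= (1 - \alpha)(1 - q)\,\frac{\eta F}{(1 + \eta F)^2} + (1 - \alpha) q\,\frac{\eta / F}{(1 + \eta / F)^2} \\
&\quad + \alpha(1 - q)\,\frac{F / \eta}{(1 + F / \eta)^2} + \alpha q\,\frac{1 / (\eta F)}{(1 + 1 / (\eta F))^2} \\
&= \left((1 - \alpha)(1 - q) + \alpha q\right)\frac{\eta F}{(1 + \eta F)^2} + \left((1 - \alpha) q + \alpha(1 - q)\right)\frac{F / \eta}{(1 + F / \eta)^2} \qquad (45) \\
&= (1 - \alpha * q)\,\frac{\eta F}{(1 + \eta F)^2} + (\alpha * q)\,\frac{F / \eta}{(1 + F / \eta)^2} \\
&\triangleq g(F), \qquad (46)
\end{aligned}$$

where we have used the fact that $\frac{e^{W_i}}{\left(1 + e^{W_i}\right)^2} = \frac{e^{-W_i}}{\left(1 + e^{-W_i}\right)^2}$
in (45). Let $\mathcal{S}$ be the support of the random variable
$F$. Clearly,

$$\mathrm{MMSE}(X|W_i) = E\,g(F) \ge \min_{s \in \mathcal{S}} g(s). \qquad (47)$$
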

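In [6, eq. (44-45)] it is shown that $\mathcal{S} \subset [1/F_{\max}, F_{\max}]$, where

$$F_{\max} \triangleq \frac{(\eta - 1)(1 - q) + \sqrt{4\eta q^2 + (\eta - 1)^2(1 - q)^2}}{2\eta q}. \qquad (48)$$

Let

$$g_1(F) \triangleq \frac{\eta F}{(1 + \eta F)^2} \quad \text{and} \quad g_2(F) \triangleq \frac{F / \eta}{(1 + F / \eta)^2},$$

and note that
$g_2(1/F) = g_1(F)$ and that $g(F) = (1 - \alpha * q)\,g_1(F) + (\alpha * q)\,g_2(F)$.
For $F > 1$ we have that $g_2(F) > g_1(F)$, whereas for
$F < 1$ we have that $g_1(F) > g_2(F)$. Since $(1 - \alpha * q) > (\alpha * q)$
(recall that we assume $\alpha, q < 1/2$), we must have that

$$\min_{s \in [1/F_{\max}, F_{\max}]} g(s) = \min_{s \in [1, F_{\max}]} g(s). \qquad (49)$$

Straightforward algebra gives

$$\operatorname{sign}(g'(s)) = \operatorname{sign}\left((\eta - s)(1 + \eta s)^3 - \left(\frac{1}{\alpha * q} - 1\right)(\eta s - 1)(\eta + s)^3\right).$$

Note that $\operatorname{sign}(g'(1)) = -1$, and therefore if the equation
$\operatorname{sign}(g'(s)) = 0$ does not have any real solution in $[1, F_{\max})$
then we must have

$$\min_{s \in [1/F_{\max}, F_{\max}]} g(s) = g(F_{\max}). \qquad (50)$$
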


Otherwise, the minimum of $g(s)$ is obtained either in one of
the solutions of $\operatorname{sign}(g'(s)) = 0$ in the interval $[1, F_{\max})$, or
in $F_{\max}$. The equation $\operatorname{sign}(g'(s)) = 0$ is equivalent to the
quartic equation

$$(\eta - s)(1 + \eta s)^3 - \frac{1 - \alpha * q}{\alpha * q}\,(\eta s - 1)(\eta + s)^3 = 0. \qquad (51)$$

Let $\mathcal{S}^*$ be the set of solutions to the equation (51) in $[1, F_{\max})$.
We conclude that $\mathrm{MMSE}(X|W_i) \ge g(F^*)$ where

$$F^* = \operatorname*{argmin}_{s \in (\mathcal{S}^* \cup F_{\max})} g(s), \qquad (52)$$

and this combined with (41) and Lemma 1 yields the following.

Theorem 6: Let $\{X_n\}$ be a first-order Markov process
with parameter $q$, $\{Z_n\}$ be an i.i.d. Bernoulli($\alpha$) process
statistically independent of $\{X_n\}$ and $Y_n = X_n \oplus Z_n$. Then

$$\bar{H}(Y) \ge h(\alpha * q) + (1 - h(\alpha * q))\,4g(F^*),$$

where $F^*$ is defined by (48), (51) and (52), $g(\cdot)$ is defined
in (46), and $\eta = \frac{1 - \alpha}{\alpha}$.
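The bound of Theorem 6 can be evaluated numerically from (46) and (48). The sketch below makes two assumptions that are not from the paper: it replaces the root-finding step (51)-(52) by a plain grid search over [1, F_max], so the computed minimum can slightly overestimate the true one, and it multiplies g by the factor 4 that Lemma 1 attaches to MMSE(X|W_i). Function names are illustrative, and 0 < α < 1/2, 0 < q ≤ 1/2 is assumed.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def f_max(alpha, q):
    """F_max from (48), with eta = (1 - alpha) / alpha."""
    eta = (1 - alpha) / alpha
    b = (eta - 1) * (1 - q)
    return (b + math.sqrt(4 * eta * q * q + b * b)) / (2 * eta * q)

def g(s, alpha, q):
    """g(s) from (46)."""
    eta = (1 - alpha) / alpha
    c = conv(alpha, q)
    x1, x2 = eta * s, s / eta
    return (1 - c) * x1 / (1 + x1) ** 2 + c * x2 / (1 + x2) ** 2

def theorem6_bound(alpha, q, grid=20001):
    """Theorem 6 with the minimisation over [1, F_max] done on a uniform grid."""
    fm = f_max(alpha, q)
    g_min = min(g(1 + (fm - 1) * i / (grid - 1), alpha, q) for i in range(grid))
    c = conv(alpha, q)
    return h(c) + (1 - h(c)) * 4 * g_min

alpha = 0.11
for q in (0.01, 0.05, 0.1, 0.3, 0.5):
    print(q, f_max(alpha, q), theorem6_bound(alpha, q))
```

Because g is at most 1/4, the returned value always lies between h(α * q) and 1; at q = 1/2 the support collapses to F_max = 1 and the bound equals 1.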

In Figure 3 we plot the bound from Theorem 6 for $\alpha = 0.11$
and $q \in [0, 0.5]$. For comparison, we also plot the lower bound
from [6, Corollary 4.8 and Lemma 4.10], and it is seen that for
small values of $q$ our new bound improves upon that of [6].

Fig. 3. Comparison between the lower bound from Theorem 6 and the lower
bound from [6, Corollary 4.8 and Lemma 4.10] for α = 0.11 and q ranging
between 0 and 0.5.